On delayed prediction of individual sequences

Authors

  • Marcelo J. Weinberger
  • Erik Ordentlich
Abstract

Prediction of individual sequences is investigated for cases in which the decision maker observes a delayed version of the sequence, or is forced to issue his/her predictions a number of steps in advance, with incomplete information. For finite action and observation spaces, it is shown that the prediction strategy that minimizes the worst-case regret with respect to the Bayes envelope is obtained through sub-sampling of the sequence of observations. The result extends to the case of logarithmic loss. For finite-state reference prediction strategies, the delayed finite-state predictability is defined and related to its non-delayed counterpart. As in the non-delayed case, an efficient on-line decision algorithm, based on the incremental parsing rule, is shown to perform in the long run essentially as well as the best finite-state strategy determined in hindsight, with full knowledge of the given sequence of observations. An application to adaptive prefetching in computer memory architectures is discussed.

Index Terms: Delayed prediction, sequential decision, on-line algorithms, general loss functions, Lempel-Ziv algorithm.

Parts of this paper were presented at the Data Compression Conference, Snowbird, Utah, USA. Work partially done while this author was with Hewlett-Packard Laboratories, Palo Alto, California.

Introduction

The problem of predicting a binary sequence $x^n = x_1 x_2 \cdots x_n$ with the goal of achieving an expected number of prediction errors (or loss) that approaches the loss of the best constant predictor has received considerable attention over the last five decades. Here, the expectation is with respect to a possible randomization in the prediction strategy, and the loss of the best constant predictor is given by the Bayes envelope $\min(n_0(x^n), n_1(x^n))$, where $n_a(x^n)$ denotes the number of occurrences in $x^n$ of $a \in \{0, 1\}$. The problem was first studied in the framework of the sequential decision problem and the approachability-excludability theory. The minimax strategy, which minimizes the worst-case regret (i.e., the excess loss over the Bayes envelope) over all $n$-tuples, was devised by Cover. Other predictors were proposed in a context where the competing reference strategy was finite-state (FS) rather than constant, and in the context of prediction with expert advice. The worst-case normalized regret of all these strategies vanishes at an $O(1/\sqrt{n})$ rate. In particular, Cover's minimax scheme yields the same regret over all sequences, its main asymptotic term being $\sqrt{n/(2\pi)}$.

The usual setting in prediction problems is that the on-line decision maker observes a prefix $x_1 x_2 \cdots x_{t-1}$ of $x^n$ for each time instant $t$, $1 \le t \le n$ (we assume the horizon $n$ is known), and makes a prediction $p_t(\cdot \mid x^{t-1})$. This prediction can be interpreted as the probability distribution used in a randomized selection of the next bit $x_t$. Thus, the expected loss takes the form $1 - p_t(x_t \mid x^{t-1})$. However, in many applications of practical interest, the on-line decision maker has access to a delayed version of the sequence, or is forced to make inferences on the observations a number of instants in advance. Such situations may arise when the application of the prediction is delayed relative to the observed sequence due to, e.g., computational constraints. The delay $d$, which is assumed known, affects the prediction strategy in that the prediction for $x_t$ is now based on $x_1 x_2 \cdots x_{t-d-1}$ only. Since every such predictor is a particular case of a non-delayed one, the achievable performance (under any performance metric) cannot improve. On the other hand, the delay does not affect the performance of a constant predictor, so that the Bayes envelope is still our targeted loss. The question arises: how badly can the worst-case regret be affected by this delay?

At first glance, it would appear that the effect of the delay is asymptotically negligible, mainly because the setting of competing against a constant strategy for a given individual sequence is often associated with a probabilistic setting in which the data are drawn from a memoryless source. For a memoryless source, the expected loss incurred at time $t$ for delayed prediction is the same as the expected loss that the predictor would incur without delay at time $t - d$. In addition, for an individual sequence, as $t$ grows, the window of $d$ hidden bits cannot significantly affect the statistics. Therefore, one would be inclined to ignore the delay and apply any of the above prediction schemes, namely, use at time $t$ the same probability that the non-delayed predictor would have used at time $t - d$. As shown in Appendix A, application of Cover's minimax strategy in such a manner indeed yields vanishing regret for all sequences, but it results in an asymptotic worst-case regret $2d + 1$ times higher than in the non-delayed case. It is also shown in Appendix A that, for a similar strategy based on the exponential weighting algorithm, the worst-case normalized regret behaves asymptotically as $\sqrt{(2d+1)(\ln 2)/(2n)}$; thus, the multiplicative factor over the $d = 0$ case is $\sqrt{2d+1}$. (One reason for obtaining a smaller factor than with the minimax strategy is that the exponential weighting algorithm has a weighting parameter, which can be optimized taking into account the value of $d$. But even with the parameter value that would be selected without delay, the factor remains smaller than for the minimax strategy, namely $d + 1$.)

The above additional regret due to the delay is immediately seen to be too high once we realize that a simple sub-sampling strategy, used in conjunction with any of the above schemes for non-delayed prediction, yields a multiplicative factor of only $\sqrt{d+1}$ in the worst-case regret. Specifically, if we sub-sample the original sequence $x^n$ at a rate $1/(d+1)$ and process the resulting $d + 1$ sub-sequences separately, each sample $x_t$ is predicted based only on previous symbols $x_j$ such that $j \equiv t \bmod (d+1)$. Therefore, any non-delayed scheme applied to each sub-sequence will satisfy the delay constraint for the original sequence, since the last symbol in the relevant sub-sequence is $x_{t-d-1}$. Now, the sum of the Bayes envelopes corresponding to each sub-sequence is not larger than the Bayes envelope of the entire sequence, and therefore an upper bound on the total regret is at most $d + 1$ times the upper bound corresponding to each sub-sequence. Since the length of each sub-sequence is about $n/(d+1)$, and the regret grows as the square root of the sequence length, the upper bound is multiplied by $\sqrt{d+1}$.

It may be somewhat surprising that a scheme that ignores most of the samples at each individual step, due to sub-sampling, has a better worst-case performance than the same prediction strategy based on the entire past sequence (without the $d$ hidden symbols). Even more surprising is the fact that, as shown in this paper, this simple strategy, when used in conjunction with the non-delayed minimax scheme, is indeed minimax for all $n$. Moreover, when $n$ is a multiple of $d + 1$, this result is shown for more general prediction games, in which the sequence of observations belongs to some finite alphabet $A$, and corresponding actions $b_1 b_2 \cdots b_n$ taken from an action space $B$ result in instantaneous losses $\ell(b_t, x_t)$, where $\ell(\cdot, \cdot)$ denotes a non-negative function. In such games, the instantaneous loss contributions from each action-observation pair yield a cumulative loss $L_b(x^n) = \sum_{t=1}^{n} \ell(b_t, x_t)$.
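The counting argument behind the sub-sampling factor can be written out as a short worked bound. This is only a sketch: the symbols $x^{(i)}$ (the $i$-th sub-sequence), $n_i$ (its length), $B(\cdot)$ (the Bayes envelope), $L(\cdot)$ (the expected loss of the predictor) and the constant $C$ are notation introduced here for illustration, and the convention that $d = 0$ means no delay (so there are $d + 1$ sub-sequences) is assumed.

```latex
\[
\begin{aligned}
\sum_{i=0}^{d} B\bigl(x^{(i)}\bigr) &\le B(x^n)
  && \text{(a sum of minima never exceeds the minimum of the sums)} \\
L(x^n) - B(x^n) &\le \sum_{i=0}^{d} \Bigl[ L\bigl(x^{(i)}\bigr) - B\bigl(x^{(i)}\bigr) \Bigr]
  \le \sum_{i=0}^{d} C \sqrt{n_i} \\
&\approx (d+1)\, C \sqrt{\frac{n}{d+1}} = \sqrt{d+1}\; C \sqrt{n}
  && \text{(using } n_i \approx n/(d+1) \text{)}
\end{aligned}
\]
```

Here $C\sqrt{m}$ stands for whatever worst-case regret bound the chosen non-delayed scheme has on a sequence of length $m$.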

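The sub-sampling scheme described in the introduction can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: for simplicity it plugs in exponential weighting over the two constant experts as the non-delayed scheme (rather than Cover's minimax scheme), the learning rate $\sqrt{8 \ln 2 / m}$ is a standard textbook tuning assumed here, and all names (`ExpWeightsBit`, `delayed_subsampled_predictions`, `expected_regret`) are hypothetical.

```python
import math
from typing import List, Sequence


class ExpWeightsBit:
    """Non-delayed exponential weighting over the experts 'always 0' and 'always 1'."""

    def __init__(self, eta: float) -> None:
        self.eta = eta          # learning rate (assumed tuning, see caller)
        self.loss = [0, 0]      # cumulative errors of 'always 0' and 'always 1'

    def prob_one(self) -> float:
        # Probability of predicting 1: w1 / (w0 + w1) with w_a = exp(-eta * loss[a]),
        # written as a logistic function and evaluated without overflow.
        z = self.eta * (self.loss[0] - self.loss[1])
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, bit: int) -> None:
        # The expert that predicted the other value makes an error.
        self.loss[1 - bit] += 1


def delayed_subsampled_predictions(x: Sequence[int], d: int) -> List[float]:
    """Return p_t = probability of predicting 1 at each t, under delay d (d = 0: no delay)."""
    n, k = len(x), d + 1
    predictors = []
    for i in range(k):
        m = len(range(i, n, k))                      # length of sub-sequence i
        predictors.append(ExpWeightsBit(math.sqrt(8 * math.log(2) / max(m, 1))))
    preds = []
    for t in range(n):
        pred = predictors[t % k]
        # State of `pred` depends only on x[t-k], x[t-2k], ..., i.e. on symbols
        # at least d+1 positions back, so the delay constraint holds.
        preds.append(pred.prob_one())
        # Updating now is harmless: this predictor is not consulted again until t + k.
        pred.update(x[t])
    return preds


def expected_regret(x: Sequence[int], preds: Sequence[float]) -> float:
    """Expected number of errors minus the Bayes envelope min(n_0, n_1)."""
    loss = sum(p if b == 0 else 1.0 - p for b, p in zip(x, preds))
    ones = sum(x)
    return loss - min(ones, len(x) - ones)
```

For example, `delayed_subsampled_predictions(x, 2)` runs three independent predictors, one per residue class of `t` modulo 3, and `expected_regret(x, preds)` compares the resulting expected loss with the Bayes envelope of the whole sequence.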



Journal:
  • IEEE Trans. Information Theory

Volume 48

Publication date: 2002